gh-90536: Add support for the BOLT post-link binary optimizer #95908

kmod · 2022-08-11T22:05:12Z

Using [bolt](https://github.com/llvm/llvm-project/tree/main/bolt) provides
a fairly large speedup without any code or functionality
changes. It provides roughly a 1% speedup on pyperformance, and a
4% improvement on the Pyston web macrobenchmarks.

It is gated behind an --enable-bolt configure arg because not all
toolchains and environments are supported. It has been tested on a
Linux x86_64 toolchain, using llvm-bolt built from the LLVM 14.0.6
sources (their binary distribution of this version did not include bolt).

Compared to [a previous attempt](faster-cpython/ideas#224),
this commit uses bolt's preferred "instrumentation" approach and adds some non-PIE
flags, which enable much better optimizations from bolt.

The effects of this change are a bit more dependent on CPU microarchitecture
than other changes, since it optimizes i-cache behavior which seems
to be a bit more variable between architectures. The 1%/4% numbers
were collected on an Intel Skylake CPU, and on an AMD Zen 3 CPU I
got a slightly larger speedup (2%/4%), and on a c6i.xlarge EC2 instance
I got a slightly lower speedup (1%/3%).

The low speedup on pyperformance is not entirely unexpected, because
BOLT improves i-cache behavior, and the benchmarks in the pyperformance
suite are small and tend to fit in i-cache.

This change uses the existing pgo profiling task (python -m test --pgo),
though I was able to measure about a 1% macrobenchmark improvement by
using the macrobenchmarks as the training task. I personally think that
both the PGO and BOLT tasks should be updated to use macrobenchmarks,
but for the sake of splitting up the work this PR uses the existing pgo task.

Issue: Experiment with LLVM BOLT binary optimizer #90536


bedevere-bot · 2022-08-11T22:05:15Z

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

gvanrossum · 2022-08-11T23:21:48Z

Thanks! I hope @corona10 can review and merge this, and maybe @pablogsal will be willing to backport it to 3.11.

pablogsal · 2022-08-11T23:27:12Z

> and maybe @pablogsal will be willing to backport it to 3.11.

Unfortunately, changes in the configure script or makefile are too much at this stage, especially for a new feature that has not been tested in the wild (by users checking the pre-releases). Sadly, this must go to 3.12.

corona10 · 2022-08-11T23:29:21Z

Nice work! I will take a look at this PR by this weekend

bedevere-bot · 2022-08-12T16:55:17Z

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

corona10

Two things need to be checked.

~~I failed to build the binary with this patch. This may be due to a BOLT bug, but I would like to know which BOLT version you used.~~ -> solved

BOLT-INFO: Allocation combiner: 30 empty spaces coalesced (dyn count: 63791805).
 #0 0x0000563eb3e8d705 PrintStackTraceSignalHandler(void*) Signals.cpp:0:0
 #1 0x0000563eb3e8b2d4 SignalHandler(int) Signals.cpp:0:0
 #2 0x00007fc228930520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #3 0x0000563eb4ebd106 llvm::bolt::BinaryFunction::translateInputToOutputAddress(unsigned long) const (/usr/local/bin/llvm-bolt+0x1c52106)
 #4 0x0000563eb3f52658 llvm::bolt::DWARFRewriter::updateUnitDebugInfo(llvm::DWARFUnit&, llvm::bolt::DebugInfoBinaryPatcher&, llvm::bolt::DebugAbbrevWriter&, llvm::bolt::DebugLocWriter&, llvm::bolt::DebugRangesSectionWriter&, llvm::Optional<unsigned long>) (/usr/local/bin/llvm-bolt+0xce7658)
 #5 0x0000563eb3f5688b llvm::bolt::DWARFRewriter::updateDebugInfo()::'lambda0'(unsigned long, llvm::DWARFUnit*)::operator()(unsigned long, llvm::DWARFUnit*) const DWARFRewriter.cpp:0:0
 #6 0x0000563eb3f5c45a llvm::bolt::DWARFRewriter::updateDebugInfo() (/usr/local/bin/llvm-bolt+0xcf145a)
 #7 0x0000563eb3f1aef8 llvm::bolt::RewriteInstance::updateMetadata() (/usr/local/bin/llvm-bolt+0xcafef8)
 #8 0x0000563eb3f428e6 llvm::bolt::RewriteInstance::run() (/usr/local/bin/llvm-bolt+0xcd78e6)
 #9 0x0000563eb355ccf8 main (/usr/local/bin/llvm-bolt+0x2f1cf8)
#10 0x00007fc228917d90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#11 0x00007fc228917e40 call_init ./csu/../csu/libc-start.c:128:20
#12 0x00007fc228917e40 __libc_start_main ./csu/../csu/libc-start.c:379:5
#13 0x0000563eb35dbd75 _start (/usr/local/bin/llvm-bolt+0x370d75)
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /usr/local/bin/llvm-bolt python -o python.bolt -data=python.fdata -update-debug-sections -reorder-blocks=ext-tsp -reorder-functions=hfsort+ -split-functions=3 -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=all -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot
make: *** [Makefile:800: bolt-opt] Segmentation fault (core dumped

While profiling, I hit a test failure. Could you check that the optimized binary passes the full standard Python test suite (e.g. python -m test)? I hit a related issue with the previous attempt, and it was solved by profiling through python -m test. -> solved

./python.bolt_inst -m test --pgo --timeout=1200 || true
0:00:00 load avg: 2.17 Run tests sequentially (timeout: 20 min)
0:00:00 load avg: 2.17 [ 1/44] test_array
0:00:01 load avg: 2.17 [ 2/44] test_base64
0:00:02 load avg: 2.07 [ 3/44] test_binascii
0:00:02 load avg: 2.07 [ 4/44] test_binop
0:00:02 load avg: 2.07 [ 5/44] test_bisect
0:00:02 load avg: 2.07 [ 6/44] test_bytes
0:00:06 load avg: 2.07 [ 7/44] test_bz2
0:00:06 load avg: 2.07 [ 8/44] test_cmath
0:00:07 load avg: 2.07 [ 9/44] test_codecs
0:00:08 load avg: 1.99 [10/44] test_collections
0:00:09 load avg: 1.99 [11/44] test_complex
0:00:10 load avg: 1.99 [12/44] test_dataclasses
0:00:10 load avg: 1.99 [13/44] test_datetime
0:00:14 load avg: 1.83 [14/44] test_decimal
0:00:18 load avg: 1.76 [15/44] test_difflib
0:00:19 load avg: 1.76 [16/44] test_embed
0:00:21 load avg: 1.76 [17/44] test_float
0:00:22 load avg: 1.76 [18/44] test_fstring
0:00:23 load avg: 1.70 [19/44] test_functools
0:00:23 load avg: 1.70 [20/44] test_generators
0:00:24 load avg: 1.70 [21/44] test_hashlib
0:00:25 load avg: 1.70 [22/44] test_heapq
0:00:26 load avg: 1.70 [23/44] test_int
0:00:26 load avg: 1.70 [24/44] test_itertools
0:00:32 load avg: 1.64 [25/44] test_json
0:00:36 load avg: 1.59 [26/44] test_long
0:00:39 load avg: 1.54 [27/44] test_lzma
0:00:39 load avg: 1.54 [28/44] test_math
0:00:42 load avg: 1.50 [29/44] test_memoryview
0:00:43 load avg: 1.50 [30/44] test_operator
0:00:44 load avg: 1.50 [31/44] test_ordered_dict
0:00:46 load avg: 1.50 [32/44] test_patma
0:00:46 load avg: 1.50 [33/44] test_pickle
0:00:52 load avg: 1.46 [34/44] test_pprint
0:00:52 load avg: 1.42 [35/44] test_re
0:00:53 load avg: 1.42 [36/44] test_set
0:01:00 load avg: 1.39 [37/44] test_sqlite3
0:01:05 load avg: 1.36 [38/44] test_statistics
0:01:10 load avg: 1.33 [39/44] test_struct
0:01:11 load avg: 1.33 [40/44] test_tabnanny
0:01:12 load avg: 1.30 [41/44] test_time
0:01:15 load avg: 1.30 [42/44] test_unicode
test test_unicode failed
0:01:17 load avg: 1.28 [43/44] test_xml_etree -- test_unicode failed (1 failure)
0:01:19 load avg: 1.28 [44/44] test_xml_etree_c

Total duration: 1 min 21 sec
Tests result: FAILURE

I will share further investigation into this patch.
FYI, this is my environment.

- OS: Ubuntu 22.04 LTS
- BOLT revision e9b213131ae9c57f4f151d3206916676135b31b0
- gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0

corona10 · 2022-08-13T11:33:43Z

Hmm, I will try to build BOLT from LLVM 14.0.6

corona10 · 2022-08-13T12:26:09Z

I found why BOLT was failing; I will downgrade gcc to version 10.


DWARF 5 has become the default in GCC 11
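
The diagnosis above can be checked mechanically before running BOLT. The following is a minimal sketch, not part of the patch: the helper names `gcc_major_version` and `defaults_to_dwarf5` are hypothetical, and the only facts it relies on are the quoted one (GCC 11 made DWARF 5 the default) and the `gcc --version` strings reported in this thread.

```python
import re


def gcc_major_version(version_line):
    """Extract the major version from the first line of `gcc --version`."""
    # Drop parenthesised vendor strings such as "(Ubuntu 11.2.0-19ubuntu1)",
    # which contain version-like numbers of their own.
    cleaned = re.sub(r"\([^)]*\)", " ", version_line)
    match = re.search(r"(\d+)\.\d+", cleaned)
    if match is None:
        raise ValueError(f"no version number found in {version_line!r}")
    return int(match.group(1))


def defaults_to_dwarf5(version_line):
    """Return True if this GCC emits DWARF 5 by default (GCC 11 and later)."""
    return gcc_major_version(version_line) >= 11


if __name__ == "__main__":
    # Version strings copied from the environments described in this thread.
    for line in (
        "gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0",
        "gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10)",
    ):
        print(line, "->", "DWARF 5 by default" if defaults_to_dwarf5(line) else "older DWARF by default")
```

Whether a given BOLT revision copes with DWARF 5 is not something this check can tell; it only flags the compiler default that triggered the crash reported here.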

corona10

Thanks for the work! The whole pipeline works correctly.

Please update https://github.com/python/cpython/blob/main/Doc/using/configure.rst too.
(If possible, https://github.com/python/cpython/blob/main/Doc/whatsnew/3.12.rst too; I will update the What's New if you are too busy.)
But please emphasize that this feature is experimental optimization support.

I am going to measure the performance improvement soon with pyperformance, and also the L1 i-cache miss ratio.

It looks like https://github.com/pyston/python-macrobenchmarks does not support Python 3.11 or 3.12 yet, right? Please let me know if I am wrong.

Also, please add your name to https://github.com/python/cpython/blob/main/Misc/ACKS too :)


bedevere-bot · 2022-08-13T14:59:42Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

corona10 · 2022-08-13T16:18:09Z

@gvanrossum @kmod cc @markshannon

Interesting result!
The following benchmark was measured on AWS c5n.metal / gcc-10 (base commit: f235178).
I would also like to re-measure the benchmark on the FasterCPython project machine.
I am going to measure the L1 i-cache miss ratio soon, on a machine where the perf tool is available.

Benchmark	CPython 3.12 ./configure --enable-optimizations --with-lto	CPython 3.12 ./configure --enable-optimizations --with-lto --enable-bolt
2to3	269 ms	255 ms: 1.05x faster
chameleon	7.39 ms	7.02 ms: 1.05x faster
chaos	74.1 ms	68.8 ms: 1.08x faster
crypto_pyaes	82.3 ms	77.2 ms: 1.07x faster
deltablue	3.65 ms	3.41 ms: 1.07x faster
django_template	38.6 ms	35.3 ms: 1.09x faster
dulwich_log	67.6 ms	58.7 ms: 1.15x faster
fannkuch	385 ms	380 ms: 1.02x faster
float	73.2 ms	72.4 ms: 1.01x faster
genshi_text	24.3 ms	23.3 ms: 1.04x faster
genshi_xml	56.4 ms	52.8 ms: 1.07x faster
go	140 ms	136 ms: 1.03x faster
hexiom	6.40 ms	6.25 ms: 1.02x faster
html5lib	65.0 ms	60.7 ms: 1.07x faster
json_dumps	11.1 ms	10.4 ms: 1.07x faster
json_loads	28.7 us	26.3 us: 1.09x faster
logging_format	7.29 us	6.69 us: 1.09x faster
logging_silent	101 ns	97.6 ns: 1.03x faster
logging_simple	6.48 us	6.01 us: 1.08x faster
mako	10.6 ms	9.91 ms: 1.07x faster
meteor_contest	106 ms	102 ms: 1.04x faster
nbody	86.4 ms	87.7 ms: 1.02x slower
nqueens	91.3 ms	88.1 ms: 1.04x faster
pathlib	19.0 ms	16.8 ms: 1.13x faster
pickle_dict	32.2 us	32.6 us: 1.01x slower
pickle_list	4.69 us	4.62 us: 1.02x faster
pickle_pure_python	297 us	282 us: 1.05x faster
pidigits	177 ms	176 ms: 1.01x faster
pyflate	423 ms	416 ms: 1.02x faster
python_startup	8.72 ms	8.15 ms: 1.07x faster
python_startup_no_site	6.35 ms	5.97 ms: 1.06x faster
raytrace	312 ms	293 ms: 1.06x faster
regex_compile	139 ms	131 ms: 1.06x faster
regex_dna	180 ms	185 ms: 1.03x slower
regex_effbot	2.99 ms	2.82 ms: 1.06x faster
regex_v8	21.4 ms	20.4 ms: 1.05x faster
richards	48.6 ms	46.3 ms: 1.05x faster
scimark_fft	348 ms	338 ms: 1.03x faster
scimark_lu	120 ms	117 ms: 1.02x faster
scimark_monte_carlo	67.0 ms	65.4 ms: 1.02x faster
scimark_sor	116 ms	113 ms: 1.02x faster
spectral_norm	101 ms	102 ms: 1.01x slower
sqlalchemy_declarative	143 ms	135 ms: 1.06x faster
sqlalchemy_imperative	19.0 ms	17.0 ms: 1.12x faster
sqlite_synth	2.50 us	2.29 us: 1.09x faster
sympy_expand	507 ms	465 ms: 1.09x faster
sympy_integrate	21.7 ms	20.5 ms: 1.06x faster
sympy_sum	176 ms	164 ms: 1.08x faster
sympy_str	311 ms	286 ms: 1.09x faster
telco	7.02 ms	6.36 ms: 1.10x faster
tornado_http	125 ms	113 ms: 1.10x faster
unpickle	15.7 us	15.1 us: 1.04x faster
unpickle_list	4.74 us	4.56 us: 1.04x faster
unpickle_pure_python	229 us	219 us: 1.05x faster
xml_etree_parse	158 ms	155 ms: 1.02x faster
xml_etree_iterparse	103 ms	101 ms: 1.02x faster
xml_etree_generate	91.0 ms	84.3 ms: 1.08x faster
xml_etree_process	61.9 ms	58.4 ms: 1.06x faster
Geometric mean	(ref)	1.05x faster

Benchmark hidden because not significant (3): pickle, scimark_sparse_mat_mult, unpack_sequence
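
For readers who want to see how the "N.NNx faster/slower" column and the geometric mean relate to the timings, here is a small sketch. It only uses four rows copied from the table above, so its geometric mean covers that subset and will not reproduce the 1.05x figure for the whole suite. The function names are hypothetical, and pyperf works from unrounded means, so a row recomputed from the rounded values shown here could differ in the last digit.

```python
import math

# (benchmark, baseline ms, BOLT ms), copied from four rows of the table above.
ROWS = [
    ("2to3", 269.0, 255.0),
    ("dulwich_log", 67.6, 58.7),
    ("nbody", 86.4, 87.7),
    ("pathlib", 19.0, 16.8),
]


def describe(baseline, changed):
    """Format a timing pair the way the table does."""
    if changed <= baseline:
        return f"{baseline / changed:.2f}x faster"
    return f"{changed / baseline:.2f}x slower"


def geometric_mean_speedup(rows):
    """Geometric mean of baseline/changed ratios (values above 1 mean faster)."""
    logs = [math.log(baseline / changed) for _, baseline, changed in rows]
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    for name, baseline, changed in ROWS:
        print(f"{name}: {describe(baseline, changed)}")
    print(f"geometric mean of this subset: {geometric_mean_speedup(ROWS):.2f}x")
```

The geometric mean is used rather than the arithmetic mean so that a 2x speedup and a 2x slowdown cancel out exactly.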

corona10 · 2022-08-14T08:00:01Z

Another benchmark from Azure VM(Ubuntu 20.04.4 LTS gcc 9.4.0):
https://gist.github.com/corona10/c2aa0108a5ffcc96be449c0ce033412d

But let's measure the benchmark from the Faster CPython machine after the PR is merged.

corona10 · 2022-08-15T02:45:18Z

Makefile.pre.in

@@ -640,6 +640,15 @@ profile-opt: profile-run-stamp
 	-rm -f profile-clean-stamp
 	$(MAKE) @DEF_MAKE_RULE@ CFLAGS_NODIST="$(CFLAGS_NODIST) $(PGO_PROF_USE_FLAG)" LDFLAGS_NODIST="$(LDFLAGS_NODIST)"

+bolt-opt: @PREBOLT_RULE@
+	rm -f *.fdata
+	@LLVM_BOLT@ $(BUILDPYTHON) -instrument -instrumentation-file-append-pid -instrumentation-file=$(abspath $(BUILDPYTHON).bolt) -o $(BUILDPYTHON).bolt_inst


Suggested change

-	@LLVM_BOLT@ $(BUILDPYTHON) -instrument -instrumentation-file-append-pid -instrumentation-file=$(abspath $(BUILDPYTHON).bolt) -o $(BUILDPYTHON).bolt_inst
+	@LLVM_BOLT@ ./$(BUILDPYTHON) -instrument -instrumentation-file-append-pid -instrumentation-file=$(abspath $(BUILDPYTHON).bolt) -o $(BUILDPYTHON).bolt_inst

corona10 · 2022-08-15T02:45:57Z

Makefile.pre.in

+	@LLVM_BOLT@ $(BUILDPYTHON) -instrument -instrumentation-file-append-pid -instrumentation-file=$(abspath $(BUILDPYTHON).bolt) -o $(BUILDPYTHON).bolt_inst
+	./$(BUILDPYTHON).bolt_inst $(PROFILE_TASK) || true
+	@MERGE_FDATA@ $(BUILDPYTHON).*.fdata > $(BUILDPYTHON).fdata
+	@LLVM_BOLT@ $(BUILDPYTHON) -o $(BUILDPYTHON).bolt -data=$(BUILDPYTHON).fdata -update-debug-sections -reorder-blocks=ext-tsp -reorder-functions=hfsort+ -split-functions=3 -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=all -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot


Suggested change

-	@LLVM_BOLT@ $(BUILDPYTHON) -o $(BUILDPYTHON).bolt -data=$(BUILDPYTHON).fdata -update-debug-sections -reorder-blocks=ext-tsp -reorder-functions=hfsort+ -split-functions=3 -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=all -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot
+	@LLVM_BOLT@ ./$(BUILDPYTHON) -o $(BUILDPYTHON).bolt -data=$(BUILDPYTHON).fdata -update-debug-sections -reorder-blocks=ext-tsp -reorder-functions=hfsort+ -split-functions=3 -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=all -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot
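
To make the order of the recipe easier to follow, the four steps of the bolt-opt rule under review (instrument, run the profiling task, merge the profiles, optimize) can be written out as plain command lines. This is only a sketch that prints the commands and runs nothing. The flags are copied from the quoted recipe and the profiling task from the log earlier in this thread; the tool names `llvm-bolt` and `merge-fdata` are assumptions about what `@LLVM_BOLT@` and `@MERGE_FDATA@` resolve to, and `bolt_opt_steps` is a hypothetical helper.

```python
import os
import shlex

# Optimization flags copied verbatim from the Makefile.pre.in recipe quoted above.
BOLT_OPT_FLAGS = (
    "-update-debug-sections -reorder-blocks=ext-tsp -reorder-functions=hfsort+ "
    "-split-functions=3 -icf=1 -inline-all -split-eh "
    "-reorder-functions-use-hot-size -peepholes=all -jump-tables=aggressive "
    "-inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack "
    "-frame-opt=hot"
).split()


def bolt_opt_steps(python="python",
                   profile_task="-m test --pgo --timeout=1200",
                   llvm_bolt="llvm-bolt",      # assumed value of @LLVM_BOLT@
                   merge_fdata="merge-fdata"):  # assumed value of @MERGE_FDATA@
    """Return the four shell command lines of the recipe, in order."""
    binary = "./" + python  # the reviewer's suggested "./" prefix
    instrument = [
        llvm_bolt, binary, "-instrument",
        "-instrumentation-file-append-pid",
        "-instrumentation-file=" + os.path.abspath(python + ".bolt"),
        "-o", python + ".bolt_inst",
    ]
    optimize = [llvm_bolt, binary, "-o", python + ".bolt",
                "-data=" + python + ".fdata"] + BOLT_OPT_FLAGS
    return [
        shlex.join(instrument),                              # 1. instrumented binary
        f"./{python}.bolt_inst {profile_task} || true",      # 2. collect profiles
        f"{merge_fdata} {python}.*.fdata > {python}.fdata",  # 3. merge per-PID profiles
        shlex.join(optimize),                                # 4. optimized binary
    ]


if __name__ == "__main__":
    for step in bolt_opt_steps():
        print(step)
```

The per-PID profile files matter because the test run spawns child interpreters, each of which writes its own .fdata file that has to be merged before the final llvm-bolt pass.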

corona10 · 2022-08-15T04:45:49Z

I succeeded in collecting cache-miss data, and I also got a pyperformance result that is similar to my previous attempts and Kevin's report.
I didn't analyze whether the GCC version or OS version affects the performance result.
But I can conclude that BOLT definitely makes CPython faster.

Environment

Hardware: AWS c5n.metal
Red Hat Enterprise Linux release 8.6 (Ootpa)
gcc: gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10)
LLVM version 14.0.6

Binary Size

Without BOLT: 79M
With BOLT: 36M

ICache miss

Experiment	instructions	L1-icache-misses	ratio
PGO + LTO	8,330,863,079,932	77,047,357,163	0.92%
PGO + LTO + BOLT	8,312,698,165,975	65,319,225,064	0.79%
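
The ratio column appears to be L1-icache-misses divided by instructions; the sketch below recomputes it from the counters in the table and also derives the relative drop in misses, which the table does not state directly. The function names are hypothetical.

```python
# (instructions, L1-icache-misses), copied from the table above.
COUNTERS = {
    "PGO + LTO": (8_330_863_079_932, 77_047_357_163),
    "PGO + LTO + BOLT": (8_312_698_165_975, 65_319_225_064),
}


def miss_ratio_percent(instructions, misses):
    """L1 i-cache misses per instruction, as a percentage."""
    return 100.0 * misses / instructions


def relative_reduction_percent(before, after):
    """How much smaller `after` is than `before`, as a percentage of `before`."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for name, (instructions, misses) in COUNTERS.items():
        print(f"{name}: {miss_ratio_percent(instructions, misses):.2f}%")
    misses_before = COUNTERS["PGO + LTO"][1]
    misses_after = COUNTERS["PGO + LTO + BOLT"][1]
    print(f"fewer misses with BOLT: {relative_reduction_percent(misses_before, misses_after):.1f}%")
```

Note that the instruction count barely changes between the two builds, so almost all of the improvement in the ratio comes from the lower miss count.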

Benchmark (1.01x faster)

https://gist.github.com/corona10/5726d1528176677d4c694265edfc4bf5


aaupov · 2022-08-18T22:29:25Z

Hi,

One thing to note is that a Python program may spend considerable time in the shared libraries corresponding to modules, and BOLT is able to optimize them as well. I would suggest profiling the benchmarks with perf and optimizing the .so's too.

corona10 · 2022-08-18T22:32:50Z

@aaupov I would recommend creating an issue for your suggestion at https://github.com/faster-cpython/ideas or https://github.com/python/cpython/issues. I think the faster-cpython repo is the more appropriate place :)


osevan · 2022-12-18T23:03:40Z

Another question, and an important aspect of performance tuning.

GCC PGO and Clang PGO are different, and the GCC PGO profiler (profile-generate) can collect more detailed data for PGO than Clang's profile-generate.

So it would be nice to add new flags:

--enable-lto-gcc --enable-pgo-gcc, taking into account the reorder flag needed at the GCC level for BOLT, followed by bolting

and one toolchain completely in Clang:
--enable-lto-llvm --enable-pgo-llvm plus bolt

Thank you very much


bedevere-bot added the awaiting review label Aug 11, 2022

corona10 self-requested a review August 11, 2022 23:28

corona10 changed the title ~~Add support for the BOLT post-link binary optimizer~~ gh-90536: Add support for the BOLT post-link binary optimizer Aug 11, 2022

Simplify the build flags

1448a68

Add a NEWS entry

c546374


corona10 reviewed Aug 13, 2022


corona10 requested changes Aug 13, 2022


bedevere-bot removed the awaiting review label Aug 13, 2022

bedevere-bot added the awaiting changes label Aug 13, 2022

corona10 self-assigned this Aug 13, 2022

corona10 reviewed Aug 15, 2022


kmod and others added 6 commits August 16, 2022 17:31

Update Makefile.pre.in

c12dbea

Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>

Update configure.ac

ce25757

Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>

Add myself to ACKS

ded38f0

Add docs

cc17806

Other review comments

0050190

fix tab/space issue

83da8c4

aaupov mentioned this pull request Aug 18, 2022

Optimizing module shared libraries with BOLT faster-cpython/ideas#449

Open

vmarkovtsev added a commit to athenianco/athenian-api that referenced this pull request Nov 16, 2022

[DEV-5385] Build CPython using BOLT

4938e5a

Using the patch adapted from python/cpython#95908

vmarkovtsev mentioned this pull request Nov 16, 2022

[DEV-5385] Build CPython using BOLT athenianco/athenian-api#3040

Merged

vmarkovtsev added a commit to athenianco/athenian-api that referenced this pull request Nov 16, 2022

[DEV-5385] Build CPython using BOLT

6e7c370

Using the patch adapted from python/cpython#95908

vmarkovtsev added a commit to athenianco/athenian-api that referenced this pull request Nov 16, 2022

[DEV-5385] Build CPython using BOLT

25c9972

Using the patch adapted from python/cpython#95908

vmarkovtsev added a commit to athenianco/athenian-api that referenced this pull request Nov 16, 2022

[DEV-5385] Build CPython using BOLT

b71c8d7

Using the patch adapted from python/cpython#95908

vmarkovtsev added a commit to athenianco/athenian-api that referenced this pull request Nov 16, 2022

[DEV-5385] Build CPython using BOLT

77c21d0

Using the patch adapted from python/cpython#95908

vmarkovtsev added a commit to athenianco/athenian-api that referenced this pull request Nov 16, 2022

[DEV-5385] Build CPython using BOLT

f23b3d7

Using the patch adapted from python/cpython#95908

vmarkovtsev added a commit to athenianco/athenian-api that referenced this pull request Nov 16, 2022

[DEV-5385] Build CPython using BOLT

a276be0

Using the patch adapted from python/cpython#95908

vmarkovtsev mentioned this pull request Nov 16, 2022

BOLT segfaults on optimizing CPython llvm/llvm-project#59025

Closed

vmarkovtsev added a commit to athenianco/athenian-api that referenced this pull request Dec 20, 2022

[DEV-5385] Build CPython using BOLT

f37d126

Using the patch adapted from python/cpython#95908

vmarkovtsev added a commit to athenianco/athenian-api that referenced this pull request Dec 20, 2022

[DEV-5385] Build CPython using BOLT

28da5b8

Using the patch adapted from python/cpython#95908

vmarkovtsev added a commit to athenianco/athenian-api that referenced this pull request Dec 20, 2022

[DEV-5385] Build CPython using BOLT

040f302

Using the patch adapted from python/cpython#95908

vmarkovtsev added a commit to athenianco/athenian-api that referenced this pull request Dec 20, 2022

[DEV-5385] Build CPython using BOLT

0640bf1

Using the patch adapted from python/cpython#95908

corona10 mentioned this pull request Jan 18, 2023

[WIP] Various build system improvements #101093

Draft

zamazan4ik mentioned this pull request Dec 4, 2023

Evaluate using Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) for C++ part and LLVM clasp-developers/clasp#1526

Open

zamazan4ik mentioned this pull request Jan 11, 2024

Evaluate using Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) shady-gang/shady#18

Closed
